Within hosts, Human T-Lymphotropic Virus Type-1 
(HTLV-1) is spread through de novo infection and 
infected cell proliferation, producing multiple T cell 
clones (infected cells with the same genomic proviral 
integration site). Between hosts, the number of clones 
observed from a 10 µg sample of DNA varies by up to 
three orders of magnitude. The question arises: what is 
the total number of clones in the host from which that 
sample was drawn? Considering each clone as a "species", 
the question becomes analogous to the "unseen 
species problem" in population ecology. We tested four 
species richness (number of species) estimators, and a 
novel approach, "DivE", using three independent datasets: 
(i) viral populations from patients infected with 
HTLV-1, (ii) T cell antigen receptor clonotype repertoires, 
and (iii) microbial data from infant faecal samples. 
In all datasets, DivE was substantially more 
accurate than the ecological estimators, which were 
strongly biased by sample size when applied to datasets 
where the majority of species was not already present. 
DivE can also be used to estimate with accuracy the 
population clone structure from small samples. Previous 
estimates of HTLV-1 clone diversity in vivo were in the 
order of 10^2, and have increased in line with method 
sensitivity. In contrast, the mean number of clones in 
the circulation of a single host (asymptomatic 
carriers and patients with chronic inflammation) estimated 
by DivE was more than two logs higher than previously 
estimated. These estimates will inform our understanding 
of the dynamics and pathogenesis of HTLV-1 
infection. 